*Author: Allison P. Harris
*Paper: Can Racial Diversity Among Judges Affect Sentencing Outcomes?
*Journal: American Political Science Review
*Created: 3/31/2023
*Code written and run using Stata/MP 16.1

clear all
/*Load raw sentencing data from Circuit Court of Cook County Criminal Division 
received in 2016
About 1500 rows in the data had to be manually adjusted by an RA due to Cook County 
data maintenance issues. The original and adjusted raw data are included.*/

*1401021 observations

*Renaming columns
rename v1 casenum
rename v2 in_date
rename v3 l_name
rename v4 dob
rename v5 race
rename v6 sex
rename v7 judge_name
rename v8 statute
rename v9 charge_des
rename v10 charge_class
rename v11 sentence
rename v12 disp_date
rename v13 min_sent
rename v14 max_sen

*Save

clear

/*Loading second batch of raw sentencing data from Circuit Court of Cook County 
Criminal Division received in 2017. These data are identical to the first set but 
include additional case information (5 more columns).*/

*1347754 observations

*Renaming columns
rename v1 casenum 
rename v2 in_date 
rename v3 l_name 
rename v4 dob 
rename v5 race 
rename v6 sex 
rename v7 judge_name 
rename v8 courtroom 
rename v9 courthouse 
rename v10 city 
rename v11 trial_type 
rename v12 att_type 
rename v13 statute 
rename v14 charge_des 
rename v15 charge_class 
rename v16 fin_disp_2 
rename v17 disp_date 
rename v18 min_sent 
rename v19 max_sen

*Save data

*Merge the two raw datasets

*About 63k observations do not merge. Most of these are from the later dataset, which includes courtroom and courthouse information. 

*Save merged data

*************************************
****Data exploration and cleaning****
*************************************

****Charge class
tab charge_class, m
sort charge_class

/*The data include the charge class for which defendants were convicted, not the class with which they were charged. All original charges were felonies, since they were heard in the Criminal Division. Models will include measures of the level of the charge on which the defendant was convicted. 
Charge classes for felonies are: M, X, 1, 2, 3, 4 and charge classes for misdemeanors are: A, B, C.
Observations whose non-missing charge class is something other than these known classes are recoded as missing. These account for very few observations and seem to include a mix of petty offenses, traffic offenses, bond offenses, and clerical/data entry errors.*/
replace charge_class = "" if !inlist(charge_class, "M", "X", "1", "2", "3", "4", "A", "B", "C")

tab charge_class, m

*Numeric charge class variable least to most serious: 1(C) --> 9(M)
gen chargeclass_num = .
replace chargeclass_num = 9 if charge_class == "M"
replace chargeclass_num = 8 if charge_class == "X"
replace chargeclass_num = 7 if charge_class == "1"
replace chargeclass_num = 6 if charge_class == "2"
replace chargeclass_num = 5 if charge_class == "3"
replace chargeclass_num = 4 if charge_class == "4"
replace chargeclass_num = 3 if charge_class == "A"
replace chargeclass_num = 2 if charge_class == "B"
replace chargeclass_num = 1 if charge_class == "C"
tab chargeclass_num charge_class, m

****Initiation Date
desc in_date
split in_date, parse (/)

rename in_date1 init_year
table init_year

rename in_date2 init_mon
table init_mon

rename in_date3 init_day
table init_day


*Making initiation year, month, and day variables numeric
destring init_year, replace
destring init_mon, replace
destring init_day, replace

*Dropping all cases initiated before 1995, since the randomized assignment of criminal cases to judges cannot be confirmed prior to 1995 and the data do not appear to be as complete before then; this could be due to outdated and retired systems from that period.
tab init_year, m
drop if init_year < 1995
*Dropped 114397 observations

*Excluding all cases initiated after 2013 so that all cases included in the data 
*have had time to reach a final disposition and the court has had time to enter all information; years after 2013 appear incomplete. Data were requested in May 2015 and received in February 2016, then updated in November 2017. 
drop if init_year > 2013
*Dropped 22566 observations
tab init_year, m

*Replacing the initiation month as missing in the few cases where the initiation month > 12
tab init_mon
replace init_mon = . if init_mon > 12

*Replacing as missing the few cases where the initiation day is coded as greater than 31.
tab init_day, m
replace init_day = . if init_day > 31
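/*Illustrative check (a sketch, not part of the original pipeline): the
range filters above keep month-day combinations that cannot exist, such as
February 30. Stata's mdy() returns missing for an impossible date, so applying
it to init_mon, init_day, and init_year would flag such rows. The dates below
are hypothetical.*/
display mdy(2, 30, 2005)    // impossible date: displays . (missing)
display mdy(2, 28, 2005)    // valid date: displays a Stata daily date number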


****Defendant DOB
desc dob
split dob, parse (/)

rename dob1 dob_year
rename dob2 dob_month
rename dob3 dob_day

*Making dob_year numeric
destring dob_year, replace
desc dob_year
tab dob_year, m
*DOB appears to be recorded as 0000/00/00 when the defendant's DOB is unknown. Recoding these as missing.
replace dob_year = . if dob == "0000/00/00"

tab dob_year, m
*There are some observations with impossible/improbable birth years for defendants (years that had not yet occurred, years that make no sense, and years where the defendant would be too young or too old). Will mark these as missing.
replace dob_year = . if (dob_year < 1923 | dob_year > 1998)
tab dob_year, m

****Defendant approximate age at time of case
gen def_age = init_year - dob_year
summ def_age
/*For roughly 1,000 of these observations, the defendant's YOB and initiation year are the same or very close, suggesting a coding error on the part of the court.*/
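/*Worked example (hypothetical dates, a sketch only): def_age subtracts birth
year from initiation year, so it can overstate age by one year when the
birthday falls later in the year than the initiation date. A defendant born
1980/11/15 whose case began 2005/03/01 is coded 25, while dividing the
elapsed days by 365.25 gives an approximate exact age of 24.*/
display 2005 - 1980                                             // def_age formula: 25
display floor((mdy(3, 1, 2005) - mdy(11, 15, 1980)) / 365.25)   // approximate exact age: 24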

****Defendant gender
replace sex = trim(sex)
gen d_female = .
replace d_female = 1 if sex == "FEMALE"
replace d_female = 0 if sex == "MALE"
tab sex d_female, m

****Defendant race
tab race, m

*renaming race variable to specify defendant's race
rename race d_race
replace d_race = trim(d_race)

*Generating new race variable with fewer categories
generate def_race = .

*def_race = 1 for Black defendants
replace def_race = 1 if d_race == "BLACK"
replace def_race = 1 if d_race == "AFRICAN-AMERICAN"
replace def_race = 1 if d_race == "BLACK HISPANIC"

*def_race = 2 for White defendants
replace def_race = 2 if d_race == "WHITE"

*def_race = 3 for Latino/Hispanic defendants
replace def_race = 3 if d_race == "MEXICAN-AMERICAN"
replace def_race = 3 if d_race == "SPANISH-AMERICAN"
replace def_race = 3 if d_race == "WHITE HISPANIC"
replace def_race = 3 if d_race == "LATIN"
replace def_race = 3 if d_race == "PUERTO-RICAN"

*def_race = 4 for all other racial and ethnic groups that Cook County coded as follows:
replace def_race = 4 if d_race == "AMERICAN INDIAN"
replace def_race = 4 if d_race == "ARAB"
replace def_race = 4 if d_race == "ORIENTAL"
replace def_race = 4 if d_race == "CHINESE"
*def_race coded as missing when Cook County recorded defendant's race as "NOT FOUND" or "OTHER/UNKNOWN"

*Coding dummy variables for each racial group
tabulate def_race, generate(dum)
rename dum1 black
rename dum2 white
rename dum3 lat_hisp
rename dum4 other

****Minimum Sentence 
/*The Cook County data for minimum sentence are provided in the following format: 
00000000, where the first three digits are the number of years, the next two digits are the number of months, and the final three digits are the number of days. The following code extracts this information into three new variables: the number of years, months, and days of the minimum sentence. It then combines these into one minimum sentence variable measured in days and one measured in years.*/
generate minsen_yrs = substr(min_sent, 1,3)
generate minsen_mos = substr(min_sent, 4,2)
generate minsen_days = substr(min_sent, 6,3)

*Converting to numeric
destring minsen_yrs minsen_mos minsen_days, generate(minyrs minmos mindays)
desc minyrs minmos mindays
summ minyrs minmos mindays

*Combining information into one variable that measures the minimum sentence in days and one in years
generate days_minsen = (minyrs * 365) + (minmos * 30) + (mindays)
summarize days_minsen
gen yrs_minsen = (days_minsen/365)
summarize yrs_minsen
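/*Worked example of the minimum sentence encoding described above (a sketch
using a hypothetical value, not a record from the data): "00206015" is
002 years, 06 months, and 015 days. With the same 365-day year and 30-day
month used for days_minsen, this is 2*365 + 6*30 + 15 = 925 days, or about
2.53 years.*/
local ex_sent "00206015"
local ex_yrs = real(substr("`ex_sent'", 1, 3))    // 2
local ex_mos = real(substr("`ex_sent'", 4, 2))    // 6
local ex_days = real(substr("`ex_sent'", 6, 3))   // 15
local ex_total = `ex_yrs' * 365 + `ex_mos' * 30 + `ex_days'
display `ex_total'            // 925
display `ex_total' / 365      // about 2.53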

/*Some quick summaries of minimum felony sentences by defendant race.*/
summarize days_minsen if black == 1
summarize days_minsen if white == 1
summarize days_minsen if lat_hisp == 1
summarize days_minsen if other == 1

****Judge Name (Judge name cleaning and recoding by Jorge)
replace judge_name = trim(judge_name) 

gen lastname = regexs(1) if regexm(judge_name, "^([A-Z]+) ")
/* This line means: "If judge_name has the structure defined in this regular expression, make lastname equal
to the part of the structure within the parentheses in the regular expression." In other words: if judge_name has a full word followed by a space ([A-Z]+ ) at the beginning (^)
of the string, then make lastname equal that word.*/
replace lastname = regexs(1) if regexm(judge_name, "^([A-Z]+),")
/* This line means: "If judge_name has a full word ([A-Z]+) followed by a comma at the beginning (^)
of the string, then make lastname equal that word."*/
replace lastname = regexs(1) if regexm(judge_name, "^(O'[A-Z]+)")
/* This line means: "If judge_name starts (^) with an O followed by an apostrophe followed by a set of capital
letters(O'[A-Z]+) then make lastname equal that word. This is for last names such as O'BRIEN*/
replace lastname = regexs(1) + "-" + regexs(2) if regexm(judge_name, "^([A-Z]+)\-([A-Z]+)") 
//This is for hyphenated last names (Word1 + "-" + Word2)
replace lastname = "" if regexm(judge_name, "NO JUDGE ASSIGNED") 
//This line means: "make lastname equal to "" if judge_name contains 'NO JUDGE ASSIGNED'"
*removed line of code here that references one judge's name 

gen firstname = regexs(2) if regexm(judge_name, "^([A-Z]+), ([A-Z]+ [A-Z]\.)") 
/* This line means: "If judge_name starts (^) with a set of capital letters ([A-Z]+)
followed by a comma, a space, and a set of capital
letters ([A-Z]+), a space, a capital letter ([A-Z]), and a period (\.), then make firstname equal to the contents of the second parentheses [regexs(2)]." The contents would then be
([A-Z]+ [A-Z]\.), meaning the second word, the letter that follows it, and the period. This captures the first name and middle initial.*/
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+) ([A-Z]+ [A-Z])$")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+) ([A-Z]+ [A-Z]\.)")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+), ([A-Z]+) JR\.")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+), ([A-Z]+)$")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+), ([A-Z]+ [A-Z]+)$")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+) ([A-Z]+ [A-Z]+)$")
replace firstname = regexs(2) if regexm(judge_name, "^(O'[A-Z]+) (.)")
replace firstname = regexs(2) if regexm(judge_name, "^(O'[A-Z]+), (.)")
replace firstname = regexs(3) if regexm(judge_name, "^([A-Z]+)\-([A-Z]+) (.)")
replace firstname = regexs(3) if regexm(judge_name, "^([A-Z]+)\-([A-Z]+), (.)")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+) ([A-Z]+ [A-Z]+) JR")
replace firstname = regexs(1) if regexm(judge_name, "JR., ([A-Z]+ [A-Z]+\.)")
replace firstname = regexs(2) if regexm(judge_name, "^([A-Z]+),([A-Z]+ [A-Z]\.) JR")
*Removed four lines of code here that reference judges' names

replace firstname = "" if judge_name == "NO JUDGE ASSIGNED"

replace firstname = firstname + "." if regexm(firstname, " [A-Z]$")
replace firstname = proper(firstname) 
//This converts the character string firstname into proper capitalization.
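/*Worked example of the name-parsing patterns above, applied to a
hypothetical "LAST, FIRST M." string rather than a real judge's name.
regexs(n) returns the nth parenthesized subexpression from the most recent
regexm() call.*/
display regexm("DOE, JANE Q.", "^([A-Z]+),")                    // 1 (match)
display regexs(1)                                                // DOE
display regexm("DOE, JANE Q.", "^([A-Z]+), ([A-Z]+ [A-Z]\.)")    // 1 (match)
display proper(regexs(2))                                        // Jane Q.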

order firstname lastname, after (judge_name)

*Save data  
/*Merge in biographical information from Chicago Council of Lawyers report and other data collection efforts from Chicago Council of Lawyers staff.*/


/*Many lines of code redacted here for the manual coding of judge race and/or gender for about 400 judges by name. This was done with Internet searches, prioritizing mentions of race or gender in interviews or articles about the judge, followed by membership in professional or social organizations associated with particular racial/ethnic or gender groups, and using pictures and last names as a last resort.*/

**Coding judge race and gender dummy variables
tab j_race 

gen j_black = .
replace j_black = 1 if j_race == "B"
replace j_black = 0 if (j_race == "A"| j_race == "H"| j_race == "W")
tab j_black j_race, m

gen j_lat = .
replace j_lat = 1 if j_race == "H"
replace j_lat = 0 if (j_race == "A"| j_race == "B"| j_race == "W")
tab j_lat j_race, m

gen j_asian = .
replace j_asian = 1 if j_race == "A"
replace j_asian = 0 if (j_race == "H"| j_race == "B"| j_race == "W")
tab j_asian j_race, m

gen j_white = .
replace j_white = 1 if j_race == "W"
replace j_white = 0 if (j_race == "A"| j_race == "H"| j_race == "B")
tab j_white j_race, m

**Gender
tab j_sex, m
gen j_male = .
replace j_male = 1 if j_sex == "M"
replace j_male = 0 if j_sex == "F"
tab j_male j_sex, m

gen j_fem = .
replace j_fem = 1 if j_sex == "F"
replace j_fem = 0 if j_sex == "M"
tab j_fem j_sex, m

****Courthouse and city
replace courthouse = trim(courthouse)
replace city = trim(city)
tab courtroom, m

*Leighton and suburban courthouse dummies
gen leighton = 1 if city == "CHICAGO"
replace leighton = 0 if (city == "SKOKIE"| city == "BRIDGEVIEW")
gen sko = 1 if city == "SKOKIE"
replace sko = 0 if (city == "CHICAGO"| city == "BRIDGEVIEW")
gen bri = 1 if city == "BRIDGEVIEW"
replace bri = 0 if (city == "CHICAGO"| city == "SKOKIE")

****Sentence/Final Disposition
table sentence, m
replace sentence = trim(sentence)

*Dummy variables for final dispositions that will be dependent variables 

*Sentenced to IL DOC, prison in general, and custody of the state
generate il_doc = 0
replace il_doc = 1 if sentence == "DEF SENTENCED ILLINOIS DOC"
replace il_doc = 1 if sentence == "DEF SENT TO LIFE IMPRISONMENT"
replace il_doc = 1 if sentence == "IDOC AND FINE"
replace il_doc = 1 if sentence == "IDOC  -  TIME SERVED"
replace il_doc = 1 if sentence == "ORDER OF CRT, STAY OF MITTIMUS"
replace il_doc = 1 if sentence == "PERIODIC IMP - IDOC"
replace il_doc = 1 if sentence == "IMPRISIONMENT FOR CONTEMPT"
replace il_doc = 1 if sentence == "SENT PER IMP + OTH DISC COND"
replace il_doc = 1 if sentence == "VIOL PROB - IDOC"
replace il_doc = . if sentence == ""
tab il_doc, m

/*Sentenced to Cook County DOC (jail): this includes any case where the defendant was sentenced to CCDOC, including those where that was in addition to something lesser, such as probation, or in addition to time served.*/
generate cc_doc = 0
replace cc_doc = 1 if sentence == "DEF SENTENCED TO COOK CNTY DOC"
replace cc_doc = 1 if sentence == "DEF SENT PROB AND CCDOC"
replace cc_doc = 1 if sentence == "DEF SENT CCDOC, FOR PART SENT"
replace cc_doc = 1 if sentence == "DEF SENT CCDOC, PERIODIC IMP"
replace cc_doc = 1 if sentence == "PROB, JAIL, AND FINE"
replace cc_doc = 1 if sentence == "PERIODIC IMP AND FINE - CCDOC"
replace cc_doc = 1 if sentence == "DEF SENT CCDOC, TIME CON SRVD"
replace cc_doc = 1 if sentence == "PROB JAIL & TIME CONS SERVED"
replace cc_doc = 1 if sentence == "VIOL PROB - CCDOC"
replace cc_doc = . if sentence == ""
tab cc_doc, m

*Illinois DOC or Cook County DOC
generate doc = 0
replace doc = 1 if (il_doc == 1| cc_doc == 1)
replace doc = . if sentence == ""
tab doc, m

*Nolle Prosequi/charge dropped or dismissed by prosecution
generate dismiss = 0
replace dismiss = 1 if sentence == "NOLLE PROSEQUI"
replace dismiss = 1 if sentence == "NO ORDER ON COUNT"
replace dismiss = 1 if sentence == "STRICKEN OFF - LEAVE REINSTATE"
replace dismiss = . if sentence == ""
tab dismiss, m

*****Generating a defendant ID
egen def_id = group (l_name dob sex)
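/*Illustrative sketch on toy data (hypothetical names and dates, not part of
the original pipeline): egen group() assigns one ID to every row that shares
l_name, dob, and sex. Because unknown birth dates remain "0000/00/00" in the
string dob, defendants with the same last name and sex and an unknown DOB
share one def_id. Rows with an empty l_name, dob, or sex get a missing ID,
since group() is used here without the missing option.*/
preserve
clear
input str5 l_name str10 dob str6 sex
"DOE" "0000/00/00" "MALE"
"DOE" "0000/00/00" "MALE"
"DOE" "1980/11/15" "MALE"
"ROE" "" "FEMALE"
end
egen demo_id = group(l_name dob sex)
list
quietly count if demo_id == 1
local demo_shared = r(N)      // 2: both unknown-DOB "DOE" rows share ID 1
quietly count if missing(demo_id)
local demo_missing = r(N)     // 1: the row with an empty dob
restore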

*Save data

/*Write out a dataset to identify the defendant's number of felony offenses at the time of a case. Note: this can only be done within the timespan of the data.*/ 
duplicates drop def_id init_year init_mon init_day, force 
order def_id l_name dob init_year init_mo init_day
sort def_id init_year init_mo init_day
quietly by def_id:  gen defdup = cond(_N==1,0,_n)
order def_id l_name defdup init_year init_mo init_day

*New variable identifying the number of the defendant's felony offense
gen fel_num = defdup

*Recoding first offense number as 1 for defendants who only appear in data once
replace fel_num = 1 if defdup == 0
order def_id defdup fel_num init_year init_mo init_day casenum l_name
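/*Illustrative sketch on toy data (hypothetical IDs and a single hypothetical
date variable standing in for init_year, init_mo, and init_day; not part of the
original pipeline): cond(_N==1, 0, _n) numbers a defendant's cases 1, 2, 3, ...
in date order when the defendant appears more than once and returns 0 for a
defendant with a single case, which fel_num then recodes to 1.*/
preserve
clear
input def_id init_date
1 100
2 50
2 80
2 120
end
sort def_id init_date
by def_id: gen defdup = cond(_N == 1, 0, _n)
gen fel_num = cond(defdup == 0, 1, defdup)
list
quietly summarize fel_num if def_id == 1
local demo_single = r(max)    // 1: single-case defendant
quietly summarize fel_num if def_id == 2
local demo_repeat = r(max)    // 3: third case of a repeat defendant
restore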

*Checking
gen init_date = mdy(init_mo, init_day, init_year)
sort def_id init_date

*Keeping what's needed to merge back to the main dataset
keep def_id fel_num init_year init_mo init_day
*Save dataset to merge back and identify previous charges
clear

*Load main dataset
*Merge with the dataset created above to help identify previous charges

*Dummy variable indicating whether defendant has any previous felony cases
gen prev_charge = 0 if fel_num == 1
replace prev_charge = 1 if fel_num > 1 & !missing(fel_num) // Stata treats missing as larger than any number


****Racial diversity of the bench
*Making a numeric judge name variable identifier
egen ident_judge = group (judge_name)

/*Courthouse location variables. These will be used to identify the Division's permanent judges. 101 is the chief judge's courtroom, and regular trials are not heard there.*/
*Courtroom
replace courtroom = trim(courtroom)
tab courtroom, m

/*Dropping the Clerk's office (where there is no judge assigned in vast majority of cases) and missing judge observations*/
drop if courtroom == "CLERK'S OFFICE" 
drop if courtroom == "NOT FOUND"

generate courtnum = substr(courtroom, 1,3)
destring courtnum, replace
tab courtnum, m

*Floor of courthouse
generate courtfl = substr(courtroom, 1,1)
destring courtfl, replace
tab courtfl, m

/*For the most part, judges remain in the same courtroom for at least one year 
and usually longer; these are the permanent judges. In some instances judges hear cases in courtrooms other than their usual courtroom, and floating or other judges fill in for permanent Division judges during absences, etc. Cases are randomly assigned to courtrooms and, therefore, to permanent judges. These are the judges who will interact as colleagues on the court and among whom the level of racial diversity on the court is most likely to matter.

Identifying the judge who appears most often in each courtroom each year.*/ 

sort init_year city courtnum ident_judge
by init_year city courtnum ident_judge: egen j_mode = count (ident_judge)
order ident_judge judge_name init_year courtnum j_mode

sort init_year city courtnum j_mode ident_judge
gen n_jmode = -j_mode
order judge_name init_year city courtnum j_mode n_jmode

sort init_year city courtnum n_jmode ident_judge
bysort init_year city courtnum: egen maxj = max(j_mode)
order judge_name init_year city courtnum j_mode maxj
gen perm_judge = 0
replace perm_judge = 1 if j_mode == maxj
order judge_name init_year city courtnum j_mode maxj perm_judge
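/*Illustrative sketch on toy data (hypothetical courtrooms and judge IDs, not
part of the original pipeline) of the rule above: within each courtroom-year,
every judge whose case count equals the maximum is flagged as permanent, so
if two judges tie, both are flagged. _N stands in for count(ident_judge), and
the grouping by city is omitted for brevity.*/
preserve
clear
input init_year courtnum ident_judge
2005 500 1
2005 500 1
2005 500 2
2005 600 3
2005 600 4
end
bysort init_year courtnum ident_judge: gen j_n = _N
bysort init_year courtnum: egen j_max = max(j_n)
gen perm = (j_n == j_max)
list, sepby(courtnum)
quietly count if perm == 0
local demo_nonperm = r(N)     // 1: judge 2 in courtroom 500
quietly count if perm == 1 & courtnum == 600
local demo_tied = r(N)        // 2: judges 3 and 4 tie in courtroom 600
restore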

*Dropping the ordering variables
drop j_mode maxj n_jmode

*Save data 

*Pulling out permanent judges to identify race and gender, then merge back
keep if perm_judge == 1

*Permanent judge identifier to generate annual diversity measure
egen ann_judgeid = group (judge_name)

*Merge in additional judicial biographical information, including past electoral outcomes.

/*Race of the judge to whose courtroom the case is assigned, for the annual racial diversity measures (j_race etc. describe the sentencing judge).*/
gen court_race = j_race 

gen crb = . 
replace crb = 1 if court_race == "B"
replace crb = 0 if court_race == "W"| court_race == "H"
tab crb court_race, m

gen crw = . 
replace crw = 1 if court_race == "W"
replace crw = 0 if court_race == "B"| court_race == "H"

gen crh = .
replace crh = 1 if court_race == "H"
replace crh = 0 if court_race == "B"| court_race == "W"

gen crfem = .
replace crfem = 1 if j_fem == 1
replace crfem = 0 if j_fem == 0

****Annual bench diversity measures
*All courthouses
bys init_year: egen numbj = total (crb)
by init_year: gen propbl =numbj/_N
summ propbl 
gen perpropbl = propbl * 100

bys init_year: egen numfem = total (crfem)
by init_year: gen propfem =numfem/_N
summ propfem 
gen perpropfem = propfem * 100

*Chicago
bys init_year: egen numbj_chi = total (crb) if city == "CHICAGO"
by init_year: gen propbl_chi =numbj_chi/_N if city == "CHICAGO"
summ propbl_chi
gen perpropbl_chi = propbl_chi * 100

bys init_year: egen numfem_chi = total (crfem) if city == "CHICAGO"
by init_year: gen propfem_chi =numfem_chi/_N if city == "CHICAGO"
summ propfem_chi
gen perpropfem_chi = propfem_chi * 100 
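/*Illustrative sketch on toy data (hypothetical values, not part of the
original pipeline): under -by init_year:-, _N is the number of observations
in the whole year. An if qualifier restricts which rows receive a value but
does not change _N, so a city-specific share computed this way divides by
every row of that year in memory. Sorting by init_year and city instead makes
_N the count for that city and year. Toy data: three Chicago rows (one with
crb = 1) and one Skokie row.*/
preserve
clear
input init_year str10 city crb
2005 CHICAGO 1
2005 CHICAGO 0
2005 CHICAGO 0
2005 SKOKIE 0
end
bysort init_year: egen demo_num = total(crb) if city == "CHICAGO"
by init_year: gen demo_year_den = demo_num/_N if city == "CHICAGO"
bysort init_year city: gen demo_city_den = demo_num/_N if city == "CHICAGO"
list
quietly summarize demo_year_den
local demo_p_year = r(mean)   // .25: denominator counts all 4 rows in 2005
quietly summarize demo_city_den
local demo_p_city = r(mean)   // about .333: denominator counts the 3 Chicago rows
restore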
 
*Both suburban courthouses
bys init_year: egen numbj_sub = total (crb) if city != "CHICAGO"
by init_year: gen propbl_sub =numbj_sub/_N if city !="CHICAGO"
summ propbl_sub
gen perpropbl_sub = propbl_sub * 100

bys init_year: egen numfem_sub = total (crfem) if city != "CHICAGO"
by init_year: gen propfem_sub =numfem_sub/_N if city != "CHICAGO"
summ propfem_sub 
gen perpropfem_sub = propfem_sub * 100
 
*Skokie
bys init_year: egen numbj_sko = total (crb) if city == "SKOKIE"
by init_year: gen propbl_sko =numbj_sko/_N if city == "SKOKIE"
summ propbl_sko
gen perpropbl_sko = propbl_sko * 100

bys init_year: egen numfem_sko = total (crfem) if city == "SKOKIE"
by init_year: gen propfem_sko =numfem_sko/_N if city == "SKOKIE"
summ propfem_sko 
gen perpropfem_sko = propfem_sko * 100

*Bridgeview
bys init_year: egen numbj_bri = total (crb) if city == "BRIDGEVIEW"
by init_year: gen propbl_bri =numbj_bri/_N if city == "BRIDGEVIEW"
summ propbl_bri
gen perpropbl_bri = propbl_bri * 100

bys init_year: egen numfem_bri = total (crfem) if city == "BRIDGEVIEW"
by init_year: gen propfem_bri =numfem_bri/_N if city == "BRIDGEVIEW"
summ propfem_bri
gen perpropfem_bri = propfem_bri * 100

*Total judges per floor in Chicago
forvalues f = 1/7 {
	egen total`f'_chi = total(courtfl == `f') if city == "CHICAGO", by(init_year)
}

*Total Black judges per floor
egen totalbfl_chi= total(j_race=="B") if city=="CHICAGO", by(courtfl init_year)

*Keep identifiers and diversity measures 

*Save data

*Merge with main dataset 

*Monthly diversity measures
sort init_year init_mo city courtnum ident_judge
by init_year init_mo city courtnum ident_judge: egen j_mode = count (ident_judge)
order ident_judge judge_name init_year init_mo city courtnum j_mode

sort init_year init_mo city courtnum j_mode ident_judge
gen n_jmode = -j_mode

sort init_year init_mo city courtnum n_jmode ident_judge
order ident_judge judge_name init_year init_mo city courtnum j_mode n_jmode
bysort init_year init_mo city courtnum: egen maxj = max(j_mode)
order ident_judge judge_name init_year init_mo city courtnum j_mode n_jmode maxj
gen perm_judge2 = 0
replace perm_judge2 = 1 if j_mode == maxj
order ident_judge judge_name init_year init_mo city courtnum j_mode n_jmode maxj perm_judge2

*Checking overlap between annual and monthly permanent judges 
tab perm_judge perm_judge2
*mostly overlapping

*Dropping the ordering variables that are no longer necessary
drop j_mode maxj n_jmode

*Save data

*Pulling out permanent judges for monthly measure
keep if perm_judge2 == 1

egen mo_judgeid = group (judge_name)

duplicates drop init_year init_mo courtnum city, force

gen court_race2 = j_race
gen crb2 = j_black
gen crw2 = j_white
gen crh2 = j_lat
gen cra2 = j_asian

bys init_year init_mo: egen numbjmon = total (crb2)
by init_year init_mo: gen propbmon =numbjmon/_N
summ propbmon 
gen perpropbmon = propbmon * 100

bys init_year init_mo: egen numbjmon_chi = total (crb2) if city == "CHICAGO"
by init_year init_mo: gen propbmon_chi =numbjmon_chi/_N if city == "CHICAGO"
summ propbmon_chi
gen perpropbmon_chi = propbmon_chi * 100

bys init_year init_mo: egen numbjmon_sub = total (crb2) if city != "CHICAGO"
by init_year init_mo: gen propbmon_sub =numbjmon_sub/_N if city != "CHICAGO"
summ propbmon_sub
gen perpropbmon_sub = propbmon_sub * 100

bys init_year init_mo: egen numbjmon_sko = total (crb2) if city == "SKOKIE"
by init_year init_mo: gen propbmon_sko =numbjmon_sko/_N if city == "SKOKIE"
summ propbmon_sko
gen perpropbmon_sko = propbmon_sko * 100

bys init_year init_mo: egen numbjmon_bri = total (crb2) if city == "BRIDGEVIEW"
by init_year init_mo: gen propbmon_bri =numbjmon_bri/_N if city == "BRIDGEVIEW"
summ propbmon_bri
gen perpropbmon_bri = propbmon_bri * 100

*Keep identifiers and diversity measures 

*Save data

*Merge with main dataset 


****Date and time variables
*Initiation date
gen trial_indate = mdy(init_mo, init_day, init_year)

*Initiation month year measure
gen initmo = ym(init_year, init_mo)
format initmo %tmMonthYY
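*Worked example (hypothetical date): ym() counts months since January 1960,
*so March 2005 is (2005 - 1960) * 12 + (3 - 1) = 542.
display ym(2005, 3)           // 542
display %tm ym(2005, 3)       // 2005m3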

*Year trend variable: 1 = 1995, 2 = 1996, ..., 19 = 2013
gen ytrend = init_year - 1994 if inrange(init_year, 1995, 2013)
 
/*Merge in party ID data collected by RAs from Cook County election results and campaign websites in July 2022*/

/*Merge in turnover data coded by an RA in 2022 in cookcounty_recoding.R.*/

****Additional variables needed for tables and figures
*Cases per judge (which is by courtroom) per year at each courthouse
bysort city init_year courtnum: gen j_casepyr=_N
order city init_year courtnum j_casepyr
bysort city: summ j_casepyr, detail

*defendant age over 30 dummy
gen defage_dummy = 1 if def_age > 30 & !missing(def_age) // exclude missing ages, which Stata treats as > 30
replace defage_dummy = 0 if def_age < 31
bysort city: summ defage_dummy

****Dropping cases that should not be included in analysis
/*Dropping cases where courtroom indicates something other than a trial courtroom, as the types of decisions made in cases heard in these courtrooms likely differ from those made when cases are initially assigned to judges.
Also dropping courtroom 101, which according to court staff is the presiding judge's courtroom (raw data confirm this with the judge most often assigned to this courtroom). However, court staff say that the presiding judge usually handles pretrial and other administrative issues in this courtroom. It is difficult to separate these from potential regular cases that might be heard in this courtroom, so it is unclear what kinds of decisions are made there.*/
tab courtroom
drop if courtnum == 101
drop if courtroom == "102      EXPUNGE/SEAL CALENDAR"
drop if courtroom == "405      APPEAL COURT"

*Change capitalization of city variable
replace city = strproper(city)

***********************************************************
**** Deidentified Dataset
***********************************************************
*Saving two final de-identified datasets
*1- cleandata_dismissals_x_m.dta includes class X and M felonies and dismissals, which are not included in the main analysis

*2- cleandata_main.dta is the dataset used in the manuscript and for most of the appendix

use cleandata_main.dta

*Figure 1: Criminal Division Annual Incarceration Rate
graph bar (mean) doc, over(init_year) scheme(s1mono) ylabel(0 .2 "20" .4 "40" .6 "60" .8 "80")

*Figure 2: Percent of Criminal Division Judges Who are Black and White
graph bar (mean) crb crw, over(init_year) ylabel(0 .2 "20" .4 "40" .6 "60" .8 "80") scheme(s1mono)

*Table 1: Case Characteristics across Locations
eststo clear
eststo descriptives: estpost tabstat black female defage_dummy chargeclass_num doc, by(city) statistics(mean sd) columns(statistics) listwise
esttab descriptives

*Table 2: Cases Per Judge Per Year
eststo clear
eststo jcases: estpost tabstat j_casepyr, by (city) statistics(mean median min max) listwise 
esttab jcases 

*Table 3: Case Characteristics By Judge Race
eststo clear
eststo jcasedesc: estpost tabstat black female age chargeclass_num, by(crb) statistics(mean sd) casewise columns(statistics) listwise
esttab 
eststo clear

***********************************************************
**** Main Text Analysis 
***********************************************************

****Unit correlations
xtset ann_judgeid 
xtreg doc perpropbl_chi black other age female ytrend if leighton == 1, fe 
predict unit, u
corr unit doc perpropbl_chi

****Models in manuscript
*Table 4: Racial Diversity and Incarceration across Courthouses
reg doc perpropbl black lat_hisp other age female chargeclass_num retention ytrend, vce (cl ann_judgeid)
reg doc perpropbl black lat_hisp other age female chargeclass_num retention ytrend if city == "Bridgeview", vce (cl ann_judgeid)
reg doc perpropbl black lat_hisp other age female chargeclass_num retention ytrend if city == "Skokie", vce (cl ann_judgeid)
reg doc perpropbl black lat_hisp other age female chargeclass_num retention ytrend if city == "Chicago", vce (cl ann_judgeid)

*Table 5: Racial Diversity, Incarceration, and Sentence Length in Chicago
xtset ann_judgeid
reg doc perpropbl_chi black lat_hisp other age female crb crh crfem j_pid last_year chargeclass_num retention ytrend if leighton == 1, vce (cl ann_judgeid)
xtreg doc perpropbl_chi black lat_hisp other age female  chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)
xtreg doc perpropbl_chi black black#c.perpropbl_chi lat_hisp other age female  chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)
xtreg days_minsen perpropbl_chi black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1), fe vce (cl ann_judgeid)
xtreg days_minsen perpropbl_chi black c.perpropbl_chi#black lat_hisp other age female  chargeclass_num retention ytrend if (leighton == 1), fe vce (cl ann_judgeid)

*Predicted probabilities referenced in text
xtreg doc perpropbl_chi black black#c.perpropbl_chi lat_hisp other age female  chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)
margins black, at (perpropbl_chi=(10 20))
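/*Worked example with hypothetical coefficients (not estimates from the paper):
because the model is linear, the difference between the margins at black = 1
and black = 0 at a given perpropbl_chi is the coefficient on black plus the
interaction coefficient times perpropbl_chi; all other terms cancel. With
b_black = 0.05 and b_inter = -0.002, the gap is 0.03 at 10 and 0.01 at 20.*/
local b_black = 0.05
local b_inter = -0.002
foreach p in 10 20 {
	display "gap at perpropbl_chi = `p': " `b_black' + `b_inter' * `p'
}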

*Table 6: Racial Diversity, Incarceration, and Judge Race in Chicago
reg doc perpropbl_chi black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), vce (cl ann_judgeid)
xtset ann_judgeid
xtreg doc perpropbl_chi black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)
xtreg doc perpropbl_chi black c.perpropbl_chi#black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)
reg doc perpropbl_chi black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crb == 1), vce (cl ann_judgeid)
xtreg doc perpropbl_chi black lat_hisp other age female  chargeclass_num retention ytrend if (leighton == 1 & crb == 1), fe vce (cl ann_judgeid)
xtreg doc perpropbl_chi black c.perpropbl_chi#black lat_hisp other age female  chargeclass_num retention ytrend if (leighton == 1 & crb == 1), fe vce (cl ann_judgeid)

*Table 7: Predicted probability of incarceration by defendant race for cases heard by White judges.
summ perpropbl_chi, detail
xtreg doc perpropbl_chi black c.perpropbl_chi#black lat_hisp other age female  chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)
margins black, at (perpropbl_chi=(8 15 20.5 22.5))

***********************************************************
**** Appendix 
***********************************************************

*Figure A.1: Number of Felony Cases Initiated per Year in the Circuit Court of Cook County, Criminal Division
graph bar (count), over(init_year) scheme(s1mono)

*Table A.1: Distribution of Cases by Defendant Race
tab def_race, m

*Table A.2: Distribution of Cases by Charge Class from Most to Least Serious
tab charge_class, m

*Table A.3: Characteristics of Criminal Division Judges
tab court_race, m
tab crfem, m

*Table B.1: Racial Diversity and Incarceration (Including Dismissed Charges).
clear 
use cleandata_dismissals_x_m

drop if charge_x == 1
drop if charge_m == 1

xtset ann_judgeid
xtreg doc perpropbl_chi black black#c.perpropbl_chi lat_hisp other age female  chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)

clear 
use cleandata_main.dta

*Figure C.1: Percent of Criminal Division Judges Who are Black and White, Annually
graph bar (mean) crb crw, over(init_year) ylabel(0 .2 "20" .4 "40" .6 "60" .8 "80") scheme(s1mono)

*Figure C.2: Percent of Criminal Division Judges Who are Black and White, Monthly
graph bar (mean) crb crw, over(initmo) ylabel(0 .2 "20" .4 "40" .6 "60" .8 "80") scheme(s1mono)

*Table D.1: Random Effects Model of Racial Diversity and Incarceration
xtset ann_judgeid
xtreg doc perpropbl_chi black black#c.perpropbl_chi crw crh lat_hisp other age female  chargeclass_num retention last_year ytrend if leighton == 1, re vce (cl ann_judgeid)

*Table E.1: Gender Diversity and Incarceration
xtset ann_judgeid
xtreg doc perpropfem_chi female black lat_hisp other age chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)
xtreg doc perpropfem_chi female c.perpropfem_chi#female black lat_hisp other age chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)

*Table E.2: Judge Race and Incarceration Disparities 
reg doc black##crw lat_hisp other age female crh crfem j_pid last_year chargeclass_num retention ytrend if leighton == 1, vce (cl ann_judgeid)

*Table F.1: Racial Diversity (measured monthly) and Sentencing
xtset mo_judgeid
xtreg doc perpropbmon_chi black black#c.perpropbmon_chi lat_hisp other age female  chargeclass_num retention i.init_year if leighton == 1, fe vce (cl mo_judgeid)

*Table G.1: Racial Diversity v. Passing of Time
xtset ann_judgeid
xtreg doc c.perpropbl_chi##black##c.ytrend lat_hisp other age female chargeclass_num retention if leighton == 1, fe vce (cl ann_judgeid)

*Table H.1: Racial Diversity and Sentencing, Incorporating Previous Charges
xtset ann_judgeid
xtreg doc perpropbl_chi black black#c.perpropbl_chi lat_hisp other age female  chargeclass_num retention prev_charge ytrend if leighton == 1, fe vce (cl ann_judgeid)

*Table I.1: Number of Black Judges and Sentencing
xtset ann_judgeid
xtreg doc numbj_chi black black#c.numbj_chi lat_hisp other age female  chargeclass_num retention ytrend if leighton == 1, fe vce (cl ann_judgeid)

*Table J.1: Black Floor Mates and Sentencing
xtset ann_judgeid
xtreg doc totalbfl_chi black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)
xtreg doc totalbfl_chi black c.totalbfl_chi#black lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)
xtreg doc totalbfl_chi black perpropbl_chi black#c.perpropbl_chi lat_hisp other age female chargeclass_num retention ytrend if (leighton == 1 & crw == 1), fe vce (cl ann_judgeid)

*Table K.1: Minority of One
/*Generate a dummy for cases initiated after the floor switch (2010 onward). The judge (ident_judge == 156) sat on a floor with an equal number of Black and White judges through most of 2009, then moved to a floor where they were the only Black judge and all of the other judges were White.*/
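*Stata treats missing values as larger than any number, so "init_year > 2009" alone would code missing years as 1; restrict to nonmissing init_year
gen post_2009 = (init_year > 2009) if !missing(init_year)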
tab init_year post_2009
label variable post_2009 "After Floor Switch" 

*1 for judge 156, 0 for all other judges, missing if ident_judge is missing
gen j_min1 = (ident_judge == 156) if !missing(ident_judge)
label variable j_min1 "Judge Min. of One"

reg doc post_2009##j_min1 black lat_hisp other age female chargeclass_num retention if leighton == 1, vce(cl ann_judgeid)
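*Optional check, not part of the original replication: a minimal sketch of the
*predicted probability of incarceration (from the linear probability model above)
*in each cell of the 2x2 design, i.e., before/after the 2009 floor switch for
*judge 156 vs. all other judges. Because the model is linear, the
*difference-in-differences of these four margins equals the coefficient on
*1.post_2009#1.j_min1; margins uses the clustered VCE from the regression.
margins post_2009#j_min1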

*Table L.1: Racial Diversity and Sentencing in Harris County, TX
clear 
use HarrisCountyAnonymousData.dta

xtset judgeid
xtreg doc perpropbl black black#c.perpropbl other age female  chargeclass_num ytrend, fe vce (cl judgeid)
xtreg doc perpropbl black black#c.perpropbl other age female  chargeclass_num ytrend if j_black == 1, fe vce (cl judgeid)
xtreg doc perpropbl black black#c.perpropbl other age female  chargeclass_num ytrend if j_white == 1, fe vce (cl judgeid)
